-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only push latest snapshot hashes in SnapshotGossipManager #31154
Only push latest snapshot hashes in SnapshotGossipManager #31154
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling out a few spots that I think may be helpful. It also may be beneficial to read the old code and new code on their own, separate from the diff (as the GitHub diff doesn't really do a good job of matching up the code, IMO).
max_full_snapshot_hashes: usize, | ||
max_incremental_snapshot_hashes: usize, | ||
full_snapshot_hashes: FullSnapshotHashes, | ||
incremental_snapshot_hashes: IncrementalSnapshotHashes, | ||
legacy_full_snapshot_hashes: FullSnapshotHashes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually both of these fields will go away. The exist currently to support LegacySnapshotHashes
.
@@ -26,120 +24,153 @@ impl SnapshotGossipManager { | |||
pub fn new( | |||
cluster_info: Arc<ClusterInfo>, | |||
max_full_snapshot_hashes: usize, | |||
max_incremental_snapshot_hashes: usize, | |||
_max_incremental_snapshot_hashes: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left this parameter here to make the diff smaller. I have a subsequent PR that removes it.
let Some(starting_snapshot_hashes) = starting_snapshot_hashes else { | ||
return; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function was originally coded before the let-else
chains were added to Rust. I find this impl to be more readable. Also, it'll make the next PR where I move this fn into new()
a bit more obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything that reduces nesting gets a thumbs up from me
assert_eq!( | ||
base_slot, latest_full_snapshot_hash.0, | ||
"the incremental snapshot's base slot ({}) must match the latest full snapshot hash's slot ({})", | ||
base_slot, latest_full_snapshot_hash.0, | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assert is moved, not deleted.
(Now is in update_incremental_snapshot_hash()
, which is called by push_incremental_snapshot_hash
.)
fn update_latest_full_snapshot_hash(&mut self, full_snapshot_hash: (Slot, SnapshotHash)) { | ||
self.latest_snapshot_hashes = Some(LatestSnapshotHashes { | ||
full: full_snapshot_hash, | ||
incremental: None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we've gotten a new full snapshot, we know the cannot be any incremental snapshots yet (based on this full snapshot).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be nice as a comment in the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #31171 to handle this request.
let latest_snapshot_hashes = self | ||
.latest_snapshot_hashes | ||
.as_mut() | ||
.expect("there must already be a full snapshot hash"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the other side of the same coin is here. In order to get a new incremental snapshot, there must already be a full snapshot to base the incremental on.
); | ||
latest_snapshot_hashes.incremental = Some(incremental_snapshot_hash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we only care about the latest snapshots, we don't need to push to a vector and then trim the size. Just directly set the new value.
// Check to see what changed in `push_snapshot_hashes()` and handle the new | ||
// error condition here. | ||
/// Push the latest snapshot hashes to the cluster via CRDS | ||
fn push_latest_snapshot_hashes_to_cluster(&self) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The impl of this function looks similar to the bottom half of the old push_incremental_snapshot_hash()
. (It was mostly copied from it 😸 )
|
||
/// Add `full_snapshot_hash` to the vector of full snapshot hashes, then push that vector to | ||
/// the cluster via CRDS. | ||
fn push_legacy_full_snapshot_hash(&mut self, full_snapshot_hash: FullSnapshotHash) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fn is unchanged from the old push_full_snapshot_hash()
. Just renamed and moved here.
Codecov Report
@@ Coverage Diff @@
## master #31154 +/- ##
=========================================
- Coverage 81.5% 81.5% -0.1%
=========================================
Files 728 729 +1
Lines 206197 206209 +12
=========================================
+ Hits 168210 168219 +9
- Misses 37987 37990 +3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Couple questions/nits
Problem
Discovery of snapshots in bootstrap is hard (see #30759). The main reason is that full snapshots and incremental snapshots are gossiped out on different CRDS values.
Going forward, we can (read: will) use CRDS
SnapshotHashes
to gossip out the full and incremental snapshots together at the same time.We also need to maintain compatibility with the existing behavior that expects to get full snapshots via
LegacySnapshotHashes
, and incremental snapshots viaSnapshotHashes
(formerly calledIncrementalSnapshotHashes
).Luckily, that's basically what the current code already does!
The bootstrap code requires that the full snapshot info from
LegacySnapshotHashes
and the base snapshot info fromSnapshotHashes
are the same, which is an invariant we will maintain. And here in SnapshotPackagerService we always use the latest full snapshot to populate the incremental snapshot information anyway.Also, bootstrap only cares about the latest snapshots. This is the meat of this change: we only need to push the latest snapshot hashes to the cluster.
Summary of Changes
Refactor the SnapshotGossipManager to only push the latest snapshot hashes (for
SnapshotHashes
). We retain the legacy behavior of keeping all the full snapshots forLegacySnapshotHashes
too.Note: Due to the repurposing of how
SnapshotHashes
is now used, the diff is not the most helpful as the design of SGM has changed. I'll try to point out any tricky and/or big changes.Note 2: It is important that nodes running a version older (i.e.
beta
) are compatible. Specifically, this means that bootstrap with v1.14 must still operate basically the same if its other nodes in the cluster are running with this change.